Effect of correlations on network controllability 
Abstract

A dynamical system is controllable if by imposing appropriate external signals on a subset of
its nodes, it can be driven from any initial state to any desired state in finite time. Here we study
the impact of various network characteristics on the minimal number of driver nodes required to
control a network. We find that clustering and modularity have no discernible impact, but the
symmetries of the underlying matching problem can produce linear, quadratic or no dependence on
degree correlation coefficients, depending on the nature of the underlying correlations. The results
are supported by numerical simulations and help narrow the observed gap between the predicted
and the observed number of driver nodes in real networks.

While during the past decade significant efforts have been devoted to understanding the
structure, evolution and dynamics of complex networks [1-6], only recently has attention
turned to an equally important problem: our ability to control them. Given the problem's
importance, recent work has extended the concept of pinning control [7-9] and structural
controllability [10-13] to complex networks. Here we focus on the latter approach. A networked
system is considered controllable if by imposing appropriate external signals on a
subset of its components, called driver nodes, the system can be driven from any initial state
to any final state in finite time [14-17]. As the control of a system requires a quantitative
description of the governing dynamical rules, progress in this area was limited to small
engineered systems. Yet, recently Liu et al. [10] showed that the identification of the minimal
number of driver nodes required to control a network, $N_D$, can be derived from the network
topology by mapping controllability [16] to the maximum matching in directed networks [18].
The mapping indicated that $N_D$ is mainly determined by the degree distribution
$P(k_{\text{in}}, k_{\text{out}})$. We know, however, that a series of characteristics, from degree
correlations [19-21] to local clustering [22] and communities [23-26], cannot be accounted for by
$P(k_{\text{in}}, k_{\text{out}})$ alone, prompting us to ask: which network characteristics affect
the system's controllability?



The three most commonly studied deviations from the random network configuration
are (i) clustering, manifested as a higher clustering coefficient $C$ than expected based on
the degree distribution [27]; (ii) community structure, representing the agglomeration of
nodes into distinct communities, captured by the modularity parameter $Q$ [25]; (iii) degree
correlations. In Sec. I A we motivate our work by showing that network characteristics
other than the degree distribution also affect network control. In Sec. I B we use numerical
simulations to identify the network characteristics that affect controllability, finding that only
degree correlations have a discernible effect. In Sec. I C we analytically derive $n_D = N_D/N$
for random networks with a given degree distribution and correlation profile. More detailed
calculations are provided in the Supplementary Information Sec. III. In Sec. I D we test our
predictions on real networks. Finally, Sec. II summarizes our results.



I. RESULTS 



A. Prediction based on the degree distribution 



To motivate our study we compared the observed $N_D$ to the prediction based on the degree
sequence for several real networks. For this we randomize each network preserving its degree
sequence and we calculate $N_D^{\text{rand}}$, the number of driver nodes for the randomized network.
Plotting $N_D$ versus $N_D^{\text{rand}}$ on log-log scale indicates that the degree sequence correctly
predicts the order of magnitude of $N_D$ despite known correlations [19, 20] (Fig. 1a). However, by
plotting $n_D = N_D/N$ versus $n_D^{\text{rand}} = N_D^{\text{rand}}/N$ we observe clear deviations from
the degree based prediction (Fig. 1b). Our goal is to understand the origin of these deviations, and the
degree to which network correlations can explain the observed $n_D$.



B. Numerical simulations 



We start from a directed network with Poisson [29, 30] or scale-free degree distribution
[32]. The scale-free network is generated by the static model described in Sec. III A. We
use simulated annealing to add various network characteristics by link rewiring, while leaving
the in- and out-degrees unchanged, tuning each measure to a desired value; for details see
Sec. III B. We computed $n_D$ using the Hopcroft-Karp algorithm.

Clustering. We use the global clustering coefficient $C$ [27], defined for directed networks as

$$ C = \frac{3 \cdot \text{number of triangles}}{2 \cdot \text{number of adjacent edge pairs}}. \tag{1} $$

The simulations indicate that changes in $C$ only slightly alter $n_D$ and that the effect is not
systematic (Fig. 2a). Hence we conclude that $C$ plays a negligible role in determining $n_D$.

Modularity. We quantify the community structure using the modularity $Q$ [25, 26], defined in
terms of the adjacency matrix $A_{vw}$ and the communities $c_v$ and $c_w$ to which the nodes $v$
and $w$ belong. Specifying $Q$ still leaves a great amount of freedom in the number and size
of the communities. We therefore choose to randomly divide the nodes into $N_c$ equally sized
groups, and increase the edge density within these groups, elevating $Q$ to the desired value.
The simulations indicate that this community structure has no effect on $n_D$ (Fig. 2b).

While adding communities to networks can be achieved in many different ways, and the
effect of modularity can be explored in more detail (e.g. hierarchical organization of
communities [35], overlapping community structure [24, 36], etc.), we have failed to detect
systematic, modularity induced changes in $n_D$, prompting us to conclude that $Q$ does not
play a leading role in $n_D$.
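As an illustration of how $Q$ can be evaluated for a given partition, the sketch below assumes the directed modularity of Leicht and Newman, $Q = \frac{1}{L}\sum_{vw}\left[A_{vw} - k_v^{\text{out}} k_w^{\text{in}}/L\right]\delta(c_v, c_w)$, with $L$ the number of links. Whether this is exactly the variant used here is an assumption, and the function name and input format are hypothetical.

```python
from collections import Counter


def directed_modularity(edges, community):
    """Directed modularity Q of a partition (assumed Leicht-Newman form).

    edges: list of (source, target) pairs without repeated links.
    community: dict mapping each node to its community label.
    """
    num_links = len(edges)
    out_deg_sum = Counter()  # total out-degree of the nodes in each community
    in_deg_sum = Counter()   # total in-degree of the nodes in each community
    intra_links = 0          # links whose endpoints share a community
    for v, w in edges:
        out_deg_sum[community[v]] += 1
        in_deg_sum[community[w]] += 1
        if community[v] == community[w]:
            intra_links += 1
    # sum_vw kout_v * kin_w * delta(c_v, c_w) / L factorises over communities.
    expected = sum(out_deg_sum[c] * in_deg_sum[c] for c in out_deg_sum) / num_links
    return (intra_links - expected) / num_links


# Two disconnected reciprocal pairs, each treated as its own community.
pairs = [(0, 1), (1, 0), (2, 3), (3, 2)]
print(directed_modularity(pairs, {0: "a", 1: "a", 2: "b", 3: "b"}))
```

Placing all nodes in a single community gives $Q = 0$ under this definition, which is a convenient sanity check.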

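Degree correlations. In directed networks each node has an in-degree ($k_i$) and an out-degree
($k_o$); thus we can define four correlation coefficients: correlations between the source
node's in- and out-degree, and the target node's in- and out-degree (Figs. 3 and 4) [28]. We use
the Pearson coefficient to quantify each correlation with a single parameter:

$$ r^{(\alpha-\beta)} = \frac{\sum_e \left(j_e^{(\alpha)} - \bar{j}^{(\alpha)}\right)\left(k_e^{(\beta)} - \bar{k}^{(\beta)}\right)}{L\,\sigma^{(\alpha)}\sigma^{(\beta)}}, \tag{3} $$

where $\sum_e$ sums over all edges, $L$ is the number of links, $\alpha, \beta \in \{\text{in}, \text{out}\}$
is the degree type, $j_e^{(\alpha)}$ is the degree of the source node and $k_e^{(\beta)}$ is the degree
of the target node of edge $e$. Here $\bar{j}^{(\alpha)} = \frac{1}{L}\sum_e j_e^{(\alpha)}$ is the average
degree of the nodes at the beginning of each link and
$\left(\sigma^{(\alpha)}\right)^2 = \frac{1}{L}\sum_e \left(j_e^{(\alpha)} - \bar{j}^{(\alpha)}\right)^2$ is
the variance; $\bar{k}^{(\beta)}$ and $\sigma^{(\beta)}$ are defined similarly for the nodes at the end
of each link.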
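The four coefficients can be measured directly from an edge list. The sketch below is a minimal illustration of the Pearson coefficient over links, with the source-node degree of type $\alpha$ paired with the target-node degree of type $\beta$; the function name and the edge-list input format are hypothetical, and the code assumes both degree lists have nonzero variance.

```python
import math


def degree_correlation(edges, alpha, beta):
    """Pearson coefficient between the alpha-degree of the source node and
    the beta-degree of the target node, taken over all links.

    edges: list of (source, target) pairs; alpha, beta: "in" or "out".
    """
    in_deg, out_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    degree = {"in": in_deg, "out": out_deg}

    j = [degree[alpha].get(u, 0) for u, _ in edges]  # source-node degrees
    k = [degree[beta].get(v, 0) for _, v in edges]   # target-node degrees
    num_links = len(edges)
    j_mean = sum(j) / num_links
    k_mean = sum(k) / num_links
    covariance = sum((a - j_mean) * (b - k_mean) for a, b in zip(j, k)) / num_links
    sigma_j = math.sqrt(sum((a - j_mean) ** 2 for a in j) / num_links)
    sigma_k = math.sqrt(sum((b - k_mean) ** 2 for b in k) / num_links)
    return covariance / (sigma_j * sigma_k)


# Node 0 (out-degree 2) points at in-degree-1 nodes, which in turn point at
# node 3 (in-degree 2): a fully disassortative out-in pattern.
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(degree_correlation(diamond, "out", "in"))
```

Calling the function with the four combinations of `"in"` and `"out"` yields the in-in, in-out, out-in and out-out coefficients.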

Simulations shown in Figs. 3 and 4 indicate that degree correlations systematically affect
$n_D$. We observe three distinct types of behavior:

(i) $n_D$ depends monotonically on $r^{(\text{out-in})}$, so that low (negative) correlations increase
$n_D$ and high (positive) correlations lower $n_D$ (Figs. 3c, 4c);

(ii) Both $r^{(\text{in-in})}$ and $r^{(\text{out-out})}$ increase $n_D$, independent of the sign of the
correlations (Figs. 3a, 3d, 4a, 4d);

(iii) $r^{(\text{in-out})}$ has no effect on $n_D$ (Figs. 3b, 4b).

The behavior is qualitatively the same for Erdős-Rényi (Fig. 3) and scale-free (Fig. 4) networks.



The diversity of these numerical results requires a deeper explanation. Therefore in the
remainder of the paper we focus on understanding analytically the role of degree correlations,
which, by systematically altering $n_D$, affect the system's controllability.



C. Analytical framework 



The task of identifying the driver nodes can be mapped to the problem of finding a
maximum matching of the network [10]. A matching is a subset of links that do not share
start or end points. We call a node matched if a link in the matching points at it and we
gain full control over a network if we control the unmatched nodes. The cavity method has
been successfully used to calculate the size of the maximum matching for undirected
and directed [10] network ensembles with given degree distribution. Here we study network
ensembles with a given degree correlation profile.
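The mapping from driver nodes to a maximum matching can be illustrated on small networks. The sketch below uses a simple augmenting-path search (not the Hopcroft-Karp algorithm, which is faster on large networks) to find a maximum set of links that share no start or end points, and then counts the unmatched nodes. The function name and input format are hypothetical, and the convention that a perfectly matched network still needs one driver node follows Liu et al. [10] rather than anything stated in this section.

```python
def driver_node_count(num_nodes, edges):
    """Number of driver nodes N_D of a directed network.

    num_nodes: nodes are labelled 0 .. num_nodes - 1.
    edges: iterable of (source, target) pairs.
    """
    out_neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        out_neighbors[u].append(v)

    # matched_by[v] is the source of the matching link pointing at v, or -1.
    matched_by = [-1] * num_nodes

    def try_augment(u, visited):
        # Try to give source u a matching link, re-routing earlier choices.
        # Recursive, so intended for small illustrative networks only.
        for v in out_neighbors[u]:
            if v in visited:
                continue
            visited.add(v)
            if matched_by[v] == -1 or try_augment(matched_by[v], visited):
                matched_by[v] = u
                return True
        return False

    matching_size = sum(try_augment(u, set()) for u in range(num_nodes))
    # Unmatched nodes must be driven directly; at least one input is needed.
    return max(num_nodes - matching_size, 1)


print(driver_node_count(3, [(0, 1), (1, 2)]))          # directed path
print(driver_node_count(4, [(0, 1), (0, 2), (0, 3)]))  # out-star
```

For the directed path only the first node is unmatched, while in the out-star the hub can match just one of its three targets, so the hub and the two remaining leaves are unmatched.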

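We calculate $n_D$ analytically for a given $P(k_{\text{in}}, k_{\text{out}})$ and selected degree-degree
correlation $e(j_{\text{in}}, j_{\text{out}}; k_{\text{in}}, k_{\text{out}})$, representing the probability
of a directed link pointing from a node with degrees $j_{\text{in}}$ and $j_{\text{out}}$ to a node with
degrees $k_{\text{in}}$ and $k_{\text{out}}$ (abbreviated below as $j_i$, $j_o$, $k_i$ and $k_o$). In the
absence of degree correlations (neutral case)

$$ e(j_i, j_o; k_i, k_o) = P^{(\text{in})}(j_i)\, Q^{(\text{out})}(j_o)\, Q^{(\text{in})}(k_i)\, P^{(\text{out})}(k_o), \tag{4} $$

where $Q^{(\text{out})}(j_o) = \frac{j_o}{\langle k \rangle} P^{(\text{out})}(j_o)$,
$Q^{(\text{in})}(k_i) = \frac{k_i}{\langle k \rangle} P^{(\text{in})}(k_i)$ and $\langle k \rangle$ is the
average degree. To ensure analytical tractability we chose [21]

$$ e^{(\text{in-in})}(j_i, j_o; k_i, k_o) = Q^{(\text{out})}(j_o) P^{(\text{out})}(k_o) \left[ P^{(\text{in})}(j_i) Q^{(\text{in})}(k_i) + r^{(\text{in-in})} m^{(\text{in-in})}(j_i, k_i) \right], \tag{5a} $$

$$ e^{(\text{in-out})}(j_i, j_o; k_i, k_o) = Q^{(\text{out})}(j_o) Q^{(\text{in})}(k_i) \left[ P^{(\text{in})}(j_i) P^{(\text{out})}(k_o) + r^{(\text{in-out})} m^{(\text{in-out})}(j_i, k_o) \right], \tag{5b} $$

$$ e^{(\text{out-in})}(j_i, j_o; k_i, k_o) = P^{(\text{in})}(j_i) P^{(\text{out})}(k_o) \left[ Q^{(\text{out})}(j_o) Q^{(\text{in})}(k_i) + r^{(\text{out-in})} m^{(\text{out-in})}(j_o, k_i) \right], \tag{5c} $$

$$ e^{(\text{out-out})}(j_i, j_o; k_i, k_o) = P^{(\text{in})}(j_i) Q^{(\text{in})}(k_i) \left[ Q^{(\text{out})}(j_o) P^{(\text{out})}(k_o) + r^{(\text{out-out})} m^{(\text{out-out})}(j_o, k_o) \right]. \tag{5d} $$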

By fixing $m^{(\alpha-\beta)}(j, k)$ ($\alpha, \beta \in \{\text{in}, \text{out}\}$) we obtain a one
parameter network ensemble characterized by $r^{(\alpha-\beta)}$, where $m^{(\alpha-\beta)}(j, k)$
satisfies the constraints

$$ \sum_{j=0}^{\infty} m^{(\alpha-\beta)}(j, k) = \sum_{k=0}^{\infty} m^{(\alpha-\beta)}(j, k) = 0, \tag{6} $$

$$ \frac{1}{\sigma^{(\alpha)}\sigma^{(\beta)}} \sum_{j,k=0}^{\infty} j\,k\; m^{(\alpha-\beta)}(j, k) = 1, \tag{7} $$

and all elements of $e^{(\alpha-\beta)}(j, k)$ are between 0 and 1.

Our goal is to understand the relation between $n_D$ and the degree correlation coefficient
$r^{(\alpha-\beta)}$. Assuming that $r^{(\alpha-\beta)}$ is small we treat the correlations as
perturbations to the neutral case, discussing the impact of the four $r^{(\alpha-\beta)}$
correlations separately.

Out-in correlations: Using equation (5c) and keeping the first nonzero correction we obtain
(Supplementary Information Sec. III):

$$ n_D^{(\text{out-in})} \approx n_D^{(0)} - r^{(\text{out-in})} \langle k \rangle \left[ M_1(w_2, 1 - w_1) + M_1(1 - w_1, w_2) \right], \tag{8} $$

where $n_D^{(0)}$ is the fraction of driver nodes of the uncorrelated network; $w_1$ and $w_2$ only
depend on $P(k_{\text{in}}, k_{\text{out}})$ and

$$ M_1(x, y) = \sum_{j,k=1}^{\infty} m^{(\text{out-in})}(j, k)\, x^{j-1} y^{k-1}. \tag{9} $$

Equation (8) predicts that $n_D$ depends linearly on $r^{(\text{out-in})}$, a prediction supported by
simulations for small $r^{(\text{out-in})}$ (Figs. 3c and 4c). This behavior is also revealed by the
equivalent problem of finding the maximum matching of graphs [10]. For a node A with out-degree $k_o$,
by definition only one edge can be in the matching. If the remaining $k_o - 1$ edges point to
nodes with degree 1 (disassortative case), A inhibits them from being matched, so we have
to control each of them individually, increasing $n_D$. If the remaining $k_o - 1$ edges point to
hubs (assortative case), these hubs are likely to be matched through another incoming edge,
decreasing $n_D$.

Out-out correlations: The cavity method indicates that for out-out correlations the first
nonzero correction is of order $\left(r^{(\text{out-out})}\right)^2$:

_(out-out) ^ —(0) ^ ^(out-out)2i^ [^(m)/(i _ ^^^M2iw2) + H'^°'''^\w2)M2{1 - W^)] , (10) 



where $H^{(\alpha)}(x) = \sum_{k=1}^{\infty} Q^{(\alpha)}(k)\, x^{k-1}$ ($\alpha \in \{\text{in}, \text{out}\}$) only depends on $P(k_{\text{in}}, k_{\text{out}})$ and

p(out)(/) 



M,w^ ± '"'°°"°°"':;^:^r""- '' -'--'-' (") 



j,fc=l,«=0 

Equation (10) predicts that $n_D^{(\text{out-out})}$ does not depend on the out-out correlation of the
directly connected nodes, but only on the correlation between the second neighbors, hence
its dependence is quadratic in $r^{(\text{out-out})}$, a prediction supported by numerical simulations
(Figs. 3d and 4d). Indeed, positive (negative) $r^{(\text{out-out})}$ correlation between the immediate
neighbors means that if node A has high out-degree, then node B is expected to have high
(low) out-degree, and therefore C is likely to have high out-degree (Fig. 5). That is, both
positive and negative one-step out-out correlations induce positive two-step correlations,
accounting for the symmetry of the effect observed in simulations (Figs. 3d and 4d).

In-in correlations: Switching the direction of each link does not change the matching, but
turns out-out correlations into in-in correlations. So $n_D^{(\text{in-in})}$ can be obtained by exchanging
$P^{(\text{in})}(k_{\text{in}})$ and $P^{(\text{out})}(k_{\text{out}})$ in equation (10), predicting again a
quadratic dependence on $r^{(\text{in-in})}$, supported by the numerical simulations (Figs. 3a and 4a).

In-out correlations: The equations for $n_D$ do not depend on the in-degree of the source
and the out-degree of the target of a link, hence we predict that $r^{(\text{in-out})}$ does not play a role
in network controllability, a prediction supported by the simulations (see Figs. 3b and 4b).

Taken together, we predict that the functional dependence of $n_D$ on degree correlations defines
three classes of behaviors, depending on the matching problem's underlying symmetries:
$n_D$ has no dependence on $r^{(\text{in-out})}$, linear dependence on $r^{(\text{out-in})}$ and quadratic
dependence on $r^{(\text{in-in})}$ and $r^{(\text{out-out})}$. These predictions are fully supported by
numerical simulations (Figs. 3 and 4): for small $r$ we see no dependence on $r^{(\text{in-out})}$, an
asymmetric, monotonic dependence on $r^{(\text{out-in})}$, and a symmetric dependence on
$r^{(\text{in-in})}$ and $r^{(\text{out-out})}$.

To directly compare the analytical predictions to simulations we need to know the complete
$e(j_i, j_o; k_i, k_o)$ distribution, which is not explicitly set in the simulations in Sec. I B. So
to test the results we use a rewiring method that sets the $e(j_i, j_o; k_i, k_o)$ distribution, not
only the correlation coefficient $r$ [21]. This method is not as robust as our original algorithm
and the range of accessible $r$ values is more restricted. However, since our results are based
on a perturbation scheme we only expect them to be correct for small $r$ values. Indeed, we
find that the predictions quantitatively reproduce the numerical results in a fair interval of
$r^{(\alpha-\beta)}$ (Fig. 6).

D. Real networks 

We test the predictions provided by the developed analytical and numerical tools on a set
of publicly available network datasets. When complex systems are mapped to networks, the
links connecting the nodes represent interactions between them. In this context self-loops
represent self-interactions, with a strong, well understood impact on controllability [38].
While in some systems self-loops are obviously present (e.g. neural networks), in others they
are manifestly absent (e.g. electric circuits [39]). Our purpose here is to test the effect of
correlations, hence we rely on datasets that capture the wiring diagram of various complex
systems with different correlation properties. Therefore, even if in a few of these maps
self-loops are missing, it is beyond the scope of this work to complete these networks. However,
when studying controllability of a particular system, careful thought has to be put into
whether self-loops are present or not. We present a systematic study on the effect of
self-loops in Supplementary Information Sec. II. B.

To test the impact of our predictions on real networks we calculate

$$ \Delta = \frac{N_D - N_D^{\text{rand}}}{N}, \tag{12} $$

where $N_D^{\text{rand}}$ represents the number of driver nodes for the degree-preserved randomized
version of the original network. Hence if $\Delta = 0$ then $P(k_{\text{in}}, k_{\text{out}})$ accurately
determines $N_D$; if $\Delta \neq 0$ then the structural properties not captured by the degree sequence
influence its controllability. We measure the correlations in several real networks and based on our
numerical and analytical results we predict the sign of $\Delta$ (Fig. 7). We grouped the networks
according to our predictions. We provide the details of each network dataset in the Supplementary
Information Table SI.

Group A. The networks of p2p Internet (Gnutella filesharing clients) do not have strong
correlations, therefore we expect $n_D$ to be correctly approximated by the prediction based
on $P(k_{\text{in}}, k_{\text{out}})$ (i.e. $\Delta \approx 0$), in line with the empirical observations.

Group B. As in most networks the three relevant correlations coexist to some degree
(Fig. 7), it is impossible to isolate their individual role. Yet, the networks in this group
(electric circuits, metabolic networks, neural networks, power grids and food webs with exception
of the Seagrass network) all have negative out-in and nonzero in-in and out-out correlations,
each of which individually increases $n_D$ as we showed above. Therefore we predict $\Delta > 0$, in
line with the empirical observations.

Group C. Only the prison social-trust and the cell phone network feature significant positive
out-in correlations. These networks also display nonzero in-in and out-out correlation,
leading to the coexistence of two competing effects: out-in correlations decrease $n_D$ and the
out-out and in-in correlations increase $n_D$. Since the out-in correlation is a first order effect
(equation (8)), while out-out and in-in correlations are only of second order (equation (10)),
we expect a decrease in $n_D$ (i.e. $\Delta < 0$), consistent with the empirical results.

Group D. The Seagrass food web and citation networks do not feature significant out-in
correlations, only the secondary in-in and out-out correlations, hence we expect $n_D$ to
increase ($\Delta > 0$), consistent with the observations.

Group E. Only the transcriptional regulatory networks are somewhat puzzling in that
they show degree correlations, yet the degree sequence still correctly gives $n_D$. However,
the simulations indicated that the effect of correlations is negligible for high $n_D$, and our
analytical results showed that the value of the correction depends on details of
$e(j_i, j_o; k_i, k_o)$ not captured by the Pearson coefficient $r$. These observations highlight that
even though in most cases our qualitative predictions based on $r$ are valid, in some cases further
investigation is required.

II. DISCUSSION 

The goal of our paper was to clarify the higher order network characteristics that influence
controllability. We studied the effect of three topological characteristics: clustering,
modularity and degree correlations. We used numerical simulations to identify the role of
the relevant characteristics, finding that changes in the clustering coefficient and the
community structure have no systematic effect on the minimum number of driver nodes $n_D$.
In contrast, degree correlations showed a robust effect, whose magnitude and direction depend
on the type of correlation. Using the cavity method we derived $n_D$ for networks with
given degree distribution and correlation profiles, finding results that are consistent with our
numerical simulations. For real networks these numerical and analytic results enabled us to
qualitatively explain the deviation of the observed $n_D$ from the prediction based only on
$P(k_{\text{in}}, k_{\text{out}})$.

Our results not only offer a new perspective on the role of topological properties on
network controllability, but also raise several questions. Future research directions include
determining the optimal network structure to minimize the number of necessary driver nodes,
and studying how different network characteristics influence the robustness of the control
configuration.

III. METHODS 

A. Generating a scale-free network 



We use the static model to generate directed scale-free networks [40]. We start from $N$
disconnected nodes and assign a weight $w_i = (i + i_0)^{-\alpha}$ to each node $i$ ($i = 1, \dots, N$). We
randomly select two nodes $i$ and $j$ with probability proportional to $w_i$ and $w_j$ respectively and
if they are not yet connected, we connect them. We allow self-loops, but avoid multi-edges.
We repeat the process until $L$ links have been placed. The resulting network has average
degree $\langle k \rangle = 2L/N$, and $P^{(\text{in/out})}(k) \sim k^{-\gamma}$ for large $k$, where
$\gamma = 1 + \frac{1}{\alpha}$, and maximum degree $k_{\max} \sim i_0^{-\alpha}$.

To systematically study correlations, the starting network has to be uncorrelated. However,
the presence of hubs may induce unwanted degree correlations [41], and may also
considerably limit the maximum and minimum correlations accessible via rewiring [42]. We
overcome these difficulties by introducing a structural cutoff in the degrees, choosing $i_0$ to
ensure $k_{\max} < (\langle k \rangle N)^{1/2}$ [43]. Note that in the static model of Goh et al.
$i_0 = 0$ [40].

As both the in- and out-degree of node $i$ are proportional to $w_i$, the above procedure results
in correlations between the in- and out-degrees of node $i$. To eliminate the correlations, we
randomize the in-degree sequence while keeping the out-degree sequence unchanged.
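The link-placement step of the static model can be sketched as follows. This is a minimal illustration, not the code used for the simulations: the function name, its parameters and the seeded random generator are hypothetical, and the final step that decorrelates the in- and out-degrees of each node is not included.

```python
import random


def static_model(num_nodes, num_links, alpha, i0=0.0, seed=0):
    """Directed static-model network as a sorted list of (source, target) links.

    Node i (i = 1 .. num_nodes) carries the weight (i + i0) ** (-alpha).
    Assumes num_links is well below num_nodes ** 2, otherwise the loop
    may take very long or never finish.
    """
    rng = random.Random(seed)
    nodes = list(range(1, num_nodes + 1))
    weights = [(i + i0) ** (-alpha) for i in nodes]
    links = set()
    while len(links) < num_links:
        # Draw source and target independently, each proportional to its weight.
        source, target = rng.choices(nodes, weights=weights, k=2)
        # The set ignores repeated pairs (no multi-edges); self-loops are kept.
        links.add((source, target))
    return sorted(links)


network = static_model(num_nodes=200, num_links=400, alpha=0.5, i0=10, seed=1)
print(len(network))
```

A larger `i0` flattens the weights of the first few nodes, which is how the degree cutoff described above limits the size of the hubs.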



B. Rewiring algorithm 



We use degree preserving rewiring [20] to add each network characteristic. Suppose that
the chosen network characteristic is quantified by a metric $X$. To set its value to $X^*$, we
define the energy $E(X) = |X - X^*|$, so $E(X^*)$ is a global minimum. We minimize this energy
by simulated annealing [44]: (1) choose two links at random with uniform probability; (2)
rewire the two links and calculate the energy $E(X)$ of the resulting network; (3) accept the
new configuration with probability

$$ p = \begin{cases} 1, & \text{if } \Delta E \le 0, \\ e^{-\beta \Delta E}, & \text{if } \Delta E > 0, \end{cases} \tag{13} $$

where the $\beta$ parameter is the inverse temperature; (4) repeat from step one and gradually
increase $\beta$. Stop if $|E(X) - E(X^*)|$ is smaller than a predefined value.

Note that keeping the degree sequence bounds the possible values of $X$ that can be
reached by rewiring. In all cases we study the full interval of accessible $X$ values.
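Steps (1) to (4) can be sketched as a short annealing loop. The function below is an illustration under stated assumptions rather than the implementation used here: its name, the geometric schedule for increasing $\beta$, the step limit and the rejection of swaps that would create a multi-edge are all hypothetical choices. The two chosen links exchange their targets, which leaves every in- and out-degree unchanged.

```python
import math
import random


def anneal_rewire(edges, metric, target, beta=1.0, cooling=1.001,
                  tol=1e-3, max_steps=20000, seed=0):
    """Tune metric(edges) towards target by degree-preserving link swaps."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    energy = abs(metric(edges) - target)
    for _ in range(max_steps):
        if energy < tol:  # stop once the metric is close enough to the target
            break
        a, b = rng.sample(range(len(edges)), 2)       # (1) pick two links
        (u1, v1), (u2, v2) = edges[a], edges[b]
        new_a, new_b = (u1, v2), (u2, v1)             # (2) swap their targets
        if new_a in present or new_b in present:
            continue                                  # assumed: no multi-edges
        edges[a], edges[b] = new_a, new_b
        new_energy = abs(metric(edges) - target)
        delta = new_energy - energy
        # (3) always accept improvements; accept worse states with exp(-beta*dE)
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            present -= {(u1, v1), (u2, v2)}
            present |= {new_a, new_b}
            energy = new_energy
        else:
            edges[a], edges[b] = (u1, v1), (u2, v2)   # undo the rejected swap
        beta *= cooling                               # (4) lower the temperature
    return edges


ring = [(i, (i + 1) % 10) for i in range(10)] + [(i, (i + 3) % 10) for i in range(10)]
forward_fraction = lambda e: sum(1 for u, v in e if u < v) / len(e)  # toy metric
rewired = anneal_rewire(ring, forward_fraction, target=0.5, max_steps=2000)
print(sorted(u for u, _ in rewired) == sorted(u for u, _ in ring))
```

Any scalar network measure can be passed as `metric`, for example a clustering coefficient or one of the degree correlation coefficients.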
FIG. 1. (a) We compare $N_D$ for real systems to $N_D^{\text{rand}}$, representing the number of driver nodes
needed to control their randomized counterparts. Randomization eliminates all local and global
correlations, only preserving the degree sequence of the original system. We find that the degree
sequence predicts the order of magnitude of $N_D$ correctly; however, small deviations are hidden
by the log scale, needed to show the whole span of $N_D$ seen in real systems. (b) These deviations
are more obvious if we compare the density of driver nodes $n_D = N_D/N$ and $n_D^{\text{rand}}$ in linear
scale, finding that for some systems (e.g. regulatory and p2p Internet networks) the degree sequence
serves as a good predictor of $n_D$, while for other systems (e.g. metabolic networks and food webs)
$n_D$ deviates from the prediction based solely on the degree sequence.
FIG. 2. Effect of the clustering coefficient $C$ and modularity $Q$ on the density of driver nodes, $n_D$.
Network size is $N = 10,000$. Each data point is an average over 50 independent runs; the error
bars, typically smaller than the symbol size, represent the standard deviation of the measurements.
FIG. 3. The impact of degree-degree correlations on the density of driver nodes ($n_D$) for the
Erdős-Rényi model ($N = 10,000$) for average degrees $\langle k \rangle = 1$ (red), $\langle k \rangle = 3$
(green), $\langle k \rangle = 5$ (blue), $\langle k \rangle = 7$ (black) and $\langle k \rangle = 9$ (orange).
The results are similar for the scale-free model (see Fig. 4).
Each data point is an average of 100 independent runs.
FIG. 4. The impact of degree-degree correlations on the density of driver nodes ($n_D$) for the
scale-free model ($N = 10,000$, $\gamma = 2.5$) for average degrees $\langle k \rangle = 1$ (red),
$\langle k \rangle = 3$ (green), $\langle k \rangle = 5$ (blue), $\langle k \rangle = 7$ (black) and
$\langle k \rangle = 9$ (orange). The results are similar for the Erdős-Rényi model (see Fig. 3).
Each data point is an average of 100 independent runs.
FIG. 5. One-step out-out correlations induce positive two-step correlation. Positive (negative)
correlation between neighboring nodes means that if node A has high out-degree, then node B is likely
to have high (low) out-degree, and hence C will likely have high out-degree.
FIG. 6. The analytic formulas are tested with simulations on an (a) Erdős-Rényi model and on
a (b) scale-free model. We used the algorithm proposed in [21] to set
$e^{(\alpha-\beta)}(j_i, j_o; k_i, k_o)$. For the (a) network we choose $N = 1,000$ and
$\langle k \rangle = 3$; for (b) $N = 1,000$, $\gamma = 2.5$ and $\langle k \rangle = 4$. Each data
point is an average over 100 independent runs; the error bars represent the standard deviation of
the measurements.
FIG. 7. The observed and predicted deviation between $N_D$ and $N_D^{\text{rand}}$. Red line:
$\Delta = (N_D - N_D^{\text{rand}})/N$, the prediction error based on the degree sequence. Dashed lines:
correlations relevant to controllability. For each network $\Delta$ is calculated by averaging over 50
independent configurations.